ACL 2012: 50th Annual Meeting of the Association for Computational Linguistics

Authors

Kentaro Inui

Greg Kondrak

Jackie C. K. Cheung

Carlos Henriquez

Abstract

Several approaches have been proposed for the automatic acquisition of multiword expressions from corpora. However, there is no agreement about which of them offers the best cost-benefit ratio, as they have been evaluated on distinct datasets and/or languages. To address this issue, we investigate these techniques along the following dimensions: expression type (compound nouns, phrasal verbs), language (English, French) and corpus size. Results show that these techniques tend to extract similar candidate lists, with high recall (~80%) for nominals and high precision (~70%) for verbals. Using association measures for candidate filtering is useful, but some of them are more costly to compute and not significantly better than raw counts. We finish with an evaluation of flexibility and an indication of which technique is recommended for each language-type-size context.
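To make the contrast between raw counts and association measures concrete, here is a minimal sketch that ranks adjacent word pairs both ways. It is not taken from the paper: pointwise mutual information (PMI) is used only as one common association measure, and the function name `rank_bigrams`, the `min_count` threshold and the toy token list are all hypothetical choices for illustration.

```python
import math
from collections import Counter


def rank_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by raw count and by PMI (illustrative only)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scored = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # drop rare pairs, whose PMI estimates are unreliable
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scored[(w1, w2)] = (count, math.log2(p_xy / (p_x * p_y)))

    by_count = sorted(scored, key=lambda b: scored[b][0], reverse=True)
    by_pmi = sorted(scored, key=lambda b: scored[b][1], reverse=True)
    return by_count, by_pmi, scored


# Toy input, assumed for illustration; real use would need a large corpus.
tokens = "the ice cream the dog the ice cream the cat the ice cream".split()
by_count, by_pmi, scored = rank_bigrams(tokens)
```

On this toy input the pairs "the ice" and "ice cream" have the same raw count, while PMI ranks "ice cream" higher because "the" is frequent on its own. Whether that extra computation pays off on real data is the kind of question the abstract raises.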

Similar resources

ACL-08: HLT, 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

ISBN: 978-1-61738-808-8. 48th Annual Meeting of the Association for Computational Linguistics 2010 (ACL 2010), Uppsala, Sweden, 11-16 July 2010

th Annual Meeting of the Association for Computational

The ACL Anthology Network (AAN) is a comprehensive, manually curated networked database of citations and collaborations in the field of Computational Linguistics. Each citation edge in AAN is associated with one or more citing sentences. A citing sentence is one that appears in a scientific article and contains an explicit reference to another article. In this paper, we shed the light on the us...